AITopics | aspect ratio

Collaborating Authors

aspect ratio

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution

Neural Information Processing SystemsJun-22-2026, 02:07:46 GMT

While preliminary explorations have superficially investigated native resolution modeling, existing works still lack systematic training recipe from the visual representation perspective. To bridge this gap, we introduce Unified Vision Transformer with NAtive Resolution, i.e. UniViTAR, a family of homogeneous vision foundation models tailored for unified visual modality and native resolution scenario in the era of multimodal. Our framework first conducts architectural upgrades to the vanilla paradigm by integrating multiple advanced components. Building upon these improvements, a progressive training paradigm is introduced, which strategically combines two core mechanisms: (1) resolution curriculum learning, transitioning from fixedresolution pretraining to native resolution tuning, thereby leveraging ViT's inherent adaptability to variable-length sequences, and (2) visual modality adaptation via inter-batch image-video switching, which balances computational efficiency with enhanced temporal reasoning. In parallel, a hybrid training framework further synergizes sigmoid-based contrastive loss with feature distillation from a frozen teacher model, thereby accelerating early-stage convergence. Finally, trained exclusively on public accessible image-caption data, our UniViTAR family across multiple model scales from 0.3B to 1.4B achieves state-of-the-art performance on a wide variety of visual-related tasks. The code and models are available here. Figure 1: The figure presents: (left) a systematic overview of model scaling performance across downstream tasks when increasing parameter size from 0.3B to 1B, and (right) a comprehensive comparison of multimodal capabilities against SOTA baselines on diversified benchmarks.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Optimal ridge regularization revisited

Timmermans, Jack, Alvarez, Sergio A.

arXiv.org Machine LearningMay-28-2026

We consider $L^2$-regularized linear (ridge) regression over a finite data sample $X$ with bounded covariance and linear prediction targets $y$ with additive isotropic noise of finite variance. We present an iterative procedure to compute the optimal regularization strength numerically from the generative parameters in the fixed-$X$ setting and prove its convergence at limited noise levels. Our experimental evaluation over synthetic data shows that the proposed procedure combined with sample-based parameter estimates attains near-optimal random-$X$ generalization across a wide range of sample sizes, aspect ratios, and noise levels, at an added computational cost equivalent to one preliminary ridge regression in the underparameterized regime and two in the overparameterized case.

artificial intelligence, machine learning, regularization, (16 more...)

arXiv.org Machine Learning

2605.28679

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

impacts limitations

Neural Information Processing SystemsApr-24-2026, 10:10:36 GMT

Broader Impacts NaViT enables training of vision transformers on variable size inputs, which has a profound impact on advancing adaptive computation research. By training models to handle various input size, we can explore adaptive computation techniques that dynamically adjust the computational resources based on the specific requirements of a given input. This flexibility opens up new avenues for implementing ideas that aim at adjusting allocation of compute and improving efficiency in vision tasks per input. Furthermore, NaViT computational efficiency unlocks the potential for scaling up pre-training of vision models. With the ability to handle different resolutions, models can effectively tackle more complex and diverse visual data, allowing for the development of larger and more powerful vision models.

artificial intelligence, machine learning, resolution, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Neural Information Processing SystemsApr-24-2026, 10:10:32 GMT

The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged.

artificial intelligence, machine learning, resolution, (15 more...)

Neural Information Processing Systems

Country: North America (0.46)

Industry: Health & Medicine (0.46)

Technology: